Robust Subspace Outlier Detection in High Dimensional Space
Abstract
Rare data in a large-scale database are called outliers, and they can reveal significant information about the real world. Subspace-based outlier detection is regarded as a feasible approach in very high dimensional space. However, the outliers found in subspaces are only a part of the true outliers in the high dimensional space. Outliers hidden among normal clustered points are sometimes missed in the projected subspace. In this paper, we propose a robust subspace method for detecting such inner outliers in a given dataset. It uses two dimensional projections: the first detects outliers in subspaces using the local density ratio, and the second finds outliers by comparing neighbors' positions. Each point's weight is calculated by summing all related values obtained in the two projection steps, and the points with the largest weights are taken as outliers. In a series of experiments with the number of dimensions ranging from 10 to 10000, the proposed method achieves high precision in extremely high dimensional space and also works well in low dimensional space.
Keywords: Outlier detection; High dimensional subspace; Dimension projection; k-NS
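The abstract gives only the outline of the scoring scheme (two projection steps whose values are summed into a per-point weight), not the formulas. The following is a minimal Python sketch of that outline under loud assumptions: every axis is treated as its own one-dimensional projection, the first step uses a simple equal-width bin density ratio, and the second step uses the mean distance to the k nearest values on that axis. The function names `density_ratio_scores`, `neighbor_gap_scores`, and `top_outliers`, and the parameters `bins` and `k`, are hypothetical and are not taken from the paper; this is not the authors' algorithm.

```python
def density_ratio_scores(points, bins=5):
    """Step-1 stand-in (assumption): per axis, mean bin count divided by
    the count of the bin the point falls into. Sparse bins score high."""
    n, d = len(points), len(points[0])
    scores = [0.0] * n
    for j in range(d):
        column = [p[j] for p in points]
        lo, hi = min(column), max(column)
        width = (hi - lo) / bins or 1.0  # fall back to 1.0 on constant axes
        idx = [min(int((x - lo) / width), bins - 1) for x in column]
        counts = [0] * bins
        for b in idx:
            counts[b] += 1
        mean_count = n / bins
        for i, b in enumerate(idx):
            scores[i] += mean_count / counts[b]
    return scores


def neighbor_gap_scores(points, k=3):
    """Step-2 stand-in (assumption): per axis, mean distance to the k
    nearest values, relative to the average of that quantity over all points."""
    n, d = len(points), len(points[0])
    scores = [0.0] * n
    for j in range(d):
        column = [p[j] for p in points]
        gaps = []
        for i, x in enumerate(column):
            dists = sorted(abs(x - y) for m, y in enumerate(column) if m != i)
            gaps.append(sum(dists[:k]) / k)
        mean_gap = sum(gaps) / n or 1.0  # avoid dividing by zero
        for i, g in enumerate(gaps):
            scores[i] += g / mean_gap
    return scores


def top_outliers(points, top=1, bins=5, k=3):
    """Sum the two per-point scores into a weight and rank by weight."""
    s1 = density_ratio_scores(points, bins)
    s2 = neighbor_gap_scores(points, k)
    weights = [a + b for a, b in zip(s1, s2)]
    ranked = sorted(range(len(points)), key=lambda i: weights[i], reverse=True)
    return ranked[:top], weights


# Toy data: 30 clustered points in 4 dimensions plus one far-away point.
demo_points = [[((i * (j + 3)) % 7) / 7.0 for j in range(4)] for i in range(30)]
demo_points.append([10.0, 10.0, 10.0, 10.0])

ranked, weights = top_outliers(demo_points, top=1)
print(ranked)
```

On this toy data the appended point is isolated on every axis, so it receives the largest weight. The pairwise neighbor search is quadratic in the number of points, which is acceptable for an illustration but would need an index or sorting per axis for a large dataset.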
Related papers
Outlier Detection in Axis-Parallel Subspaces of High Dimensional Data
We propose an original outlier detection schema that detects outliers in varying subspaces of a high dimensional feature space. In particular, for each object in the data set, we explore the axis-parallel subspace spanned by its neighbors and determine how much the object deviates from the neighbors in this subspace. In our experiments, we show that our novel subspace outlier detection is super...
A Robust Method for Detecting DB-Outliers from High Dimensional Datasets
Outlier detection is a popular technique that can be utilized in many modern applications like financial analysis and fraud detection. As data descriptions become more complex, the dimensionality of the datasets in use keeps increasing. However, current research finds that it is extremely difficult to pick out outliers directly from high dimensional datasets owing to the curse of dimensionality. ...
Example-Based DB-Outlier Detection from High Dimensional Datasets
Outlier detection is an important problem that has applications in many fields. High dimensional datasets are common in such applications. Among the existing outlier detection methods, Distance-Based outlier (DB-Outlier) detection is one of the most generalizable and simplest approaches. It finds outliers by calculating distances between data points. However, in high dimensional space, data dis...
A Novel Subspace Outlier Detection Approach in High Dimensional Data Sets
Many real applications require detecting outliers in high dimensional data sets. The major difficulty of mining outliers lies in the fact that outliers are often embedded in subspaces. No efficient methods are available in general for subspace-based outlier detection. Most existing subspace-based outlier detection methods identify outliers by searching for abnormal sparse density units in s...
Detecting High-Dimensional Outliers: the New Task, Algorithms and Performance
Outlier detection is a fundamental step in knowledge discovery in databases. With the increasing number of high-dimensional databases, existing outlier detection algorithms that work only in the context of full space are unable to effectively screen out informative outliers. This is because the majority of these outliers exist only in subspaces. In this paper, we identify a new outlier detection t...
Journal title: CoRR
Volume: abs/1405.0869
Publication date: 2014